twoDxcThis vignette demonstrates tools provided by twoDxc for analyzing
two-dimensional (2D) chromatography/mass spectrometry (MS) data. This package
relies on xcms and CAMERA to read and process MS data.
The example data file used will be an LC\(\times\)LC/MS run using a quadrupole-time-of-flight (QTOF) MS. The sample analyzed was a tea leaf extraction for a metabolomics study. The modulation time for the run was 60 seconds with approximately 30 seconds before 2D valve modulation began, specified as the dead time. Mass spectra were acquired at 2 Hz from 100 - 1700 m/z.
library(twoDxc)
#> Warning: multiple methods tables found for 'plot'
#> Warning: multiple methods tables found for 'impute'
#> Warning: replacing previous import 'xcms::plot' by 'graphics::plot' when loading
#> 'CAMERA'
library(dplyr)
library(future.apply)
library(ggplot2)
tea.file <- system.file('tea_data', 'tea_pool_1.mzML', package = 'twoDxc')
tea.data <- readMSData(tea.file, mode = 'onDisk')
xcms preprocessing2D chromatographic data can be processed the same as 1D data, except each compound can appear in multiple consecutive modulations. Below, the base peak chromatogram is plotted as if the data were 1D.
tea.bpi <- chromatogram(tea.data, aggregationFun = 'max')
plot(tea.bpi)
The chromatogram’s sawtooth pattern arises from the gradient separations performed at
fast, ultra high pressure conditions. We can see that at least two peaks from
caffeine elute in the modulations around 2700 seconds. The extracted ion
chromatogram (EIC) for caffeine can also be plotted to show multiple peaks per
compound. twoDxc provides a function for calculating m/z ranges given a
parts-per-million (ppm) tolerance. The MS used to acquire this data has a
tolerance of 20 ppm. Note the retention time range in the plot is adjusted.
mz.range.195 <- calc.mz.Window(195.0882, 20)
# m/z tolerance window for caffeine's accurate mass is:
print(mz.range.195)
#> [1] 195.0843 195.0921
eic.195 <- chromatogram(tea.data, mz = mz.range.195, rt = c(2500, 3000))
plot(eic.195)
Now peak detection will be performed on the data as if it were 1D with xcms.
cwp <- CentWaveParam(snthresh = 100, peakwidth = c(6, 15),
prefilter = c(5, 1000))
tea.peaks <- findChromPeaks(tea.data %>%
filterRt(rt = c(4.8 * 60, 60 * 60)),
param = cwp)
pdp <- PeakDensityParam(sampleGroups = 1, minFraction = 1, bw = 3)
tea.peaks <- groupChromPeaks(tea.peaks, pdp)
feature.chroms <- featureChromatograms(tea.peaks, features = 1:4)
par(mar = rep(2, 4))
plot(feature.chroms)
Notice the same ions are found about 60 seconds apart.
We will now group ions into pseudospectra using the CAMERA package.
tea.xsa <- xsAnnotate(as(tea.peaks, 'xcmsSet'))
# Group by fwhm
tea.xsaF <- groupFWHM(tea.xsa, perfwhm = 0.6)
#> Start grouping after retention time.
#> Created 562 pseudospectra.
# Label isotopes
tea.xsaI <- findIsotopes(tea.xsaF, mzabs = 0.01)
#> Generating peak matrix!
#> Run isotope peak annotation
#> % finished: 10 20 30 40 50 60 70 80 90 100
#> Found isotopes: 247
# Group by correlation
tea.xsaC <- groupCorr(tea.xsaF)
#> Start grouping after correlation.
#>
#> Calculating peak correlations in 562 Groups...
#> % finished: 10 20 30 40 50 60 70 80 90 100
#>
#> Calculating graph cross linking in 562 Groups...
#> % finished: 10 20 30 40 50 60 70 80 90 100
#> New number of ps-groups: 906
#> xsAnnotate has now 906 groups, instead of 562
# Find adducts
tea.xsaFA <- findAdducts(tea.xsaC, polarity = 'positive')
#>
#> Calculating possible adducts in 906 Groups...
#> % finished: 10 20 30 40 50 60 70 80 90 100
# Plot example spectrum
plotPsSpectrum(tea.xsaFA, pspec = c(1:4), maxlabel = 5)
Three of the first four pspectra are 60 seconds apart from each other and
exhibit similar spectra.
We will use twoDxc to group these spectra together. The
group2D function takes its arguments as: 1) the xsAnnotate object, 2) the
modulation time for the comprehensive 2D run, and 3) the dead time, or where the
time at which modulations start, to adjust the 2D retention time.
The algorithm groups features by their m/z, 1D RT, and 2D RT, with default tolerances of 20 ppm, 3 minutes, and 5 seconds, respectively. Matching features are grouped into one psg.2d group and their intensities summed. This way, we can make more accurate comparisons between different phenotypes and cut down on redundant features.
filter.Ion <- function(pspectra, ion, ppm = 20){
ion.range <- calc.mz.Window(ion, ppm)
filtered.pspectra <- as.data.frame(pspectra) %>%
filter(mz > ion.range[1] & mz < ion.range[2])
return(filtered.pspectra)
}
print(getpspectra(tea.xsaFA, grp = c(1:10)) %>%
filter.Ion(195.0882))
#> mz mzmin mzmax rt rtmin rtmax into
#> CP1243 195.0877 195.0873 195.0881 2698.261 2693.784 2704.230 3668398.2
#> CP1181 195.0876 195.0871 195.0883 2639.069 2635.587 2644.540 997533.8
#> CP1285 195.0878 195.0872 195.0881 2757.453 2751.982 2762.925 536909.1
#> intb maxo sn sample isotopes adduct psg
#> CP1243 3664248.6 1236855.8 2720 1 [M+H]+ 194.078686274894 1
#> CP1181 997349.6 366758.0 3244 1 3
#> CP1285 528198.2 194822.4 233 1 4
tea.xsa2D <- group2D(tea.xsaFA, 60, 30)
#> Added 8858 spectra
#> Condensing...
#> Grouped 492 2D pseudospectra
print(tea.xsa2D@pspec2D %>%
filter.Ion(195.0882))
#> mz mzmin mzmax rt rtmin rtmax rt.2d rtmin.2d
#> 1 195.0877 195.0871 195.0883 2698.261 2635.587 2762.925 28.261 21.982
#> rtmax.2d into isotopes adducts psgs psg.2d
#> 1 34.54 5202841 , , [M+H]+ 194.078686274894, , 1, 3, 4 1
Below follows the sample sample processing steps but for three replicates of the tea extraction.
teas.files <- dir(system.file('tea_data', package = 'twoDxc'),
full.names = T)[1:3]
teas.group.info <- data.frame(sample_name = sub(basename(teas.files),
pattern = '.mzML',
replacement = '',
fixed = T),
sample_group = c(rep('lvl_1', 3)),
stringsAsFactors = F
)
teas.data <- readMSData(teas.files, pdata = new('NAnnotatedDataFrame',
teas.group.info),
mode = 'onDisk')
xchr.multi <- findChromPeaks(teas.data %>%
filterRt(rt = c(4.8 * 60, 60 * 60)),
param = cwp)
pdp.multi <- PeakDensityParam(sampleGroups = teas.data$sample_group,
minFraction = (2/3), bw = 3)
xchr.multi <- groupChromPeaks(xchr.multi, param = pdp.multi)
xchr.multi <- fillChromPeaks(xchr.multi)
xsa.multi <- xsAnnotate(as(xchr.multi, 'xcmsSet'))
xsaF.multi <- groupFWHM(xsa.multi, perfwhm = 0.6)
#> Start grouping after retention time.
#> Created 381 pseudospectra.
xsaI.multi <- findIsotopes(xsaF.multi)
#> Generating peak matrix!
#> Run isotope peak annotation
#> % finished: 10 20 30 40 50 60 70 80 90 100
#> Found isotopes: 217
xsaC.multi <- groupCorr(xsaI.multi)
#> Start grouping after correlation.
#> Generating EIC's ..
#> Warning: Found NA peaks in selected sample.
#>
#> Calculating peak correlations in 381 Groups...
#> % finished: 10 20 30 40 50 60 70 80 90 100
#>
#> Calculating graph cross linking in 381 Groups...
#> % finished: 10 20 30 40 50 60 70 80 90 100
#> New number of ps-groups: 646
#> xsAnnotate has now 646 groups, instead of 381
xsaFA.multi <- findAdducts(xsaC.multi, polarity = 'positive')
#> Generating peak matrix for peak annotation!
#>
#> Calculating possible adducts in 646 Groups...
#> % finished: 10 20 30 40 50 60 70 80 90 100
plotPsSpectrum(xsaFA.multi, pspec = 1, maxlabel = 5)
group2D on multiple samplestimer <- proc.time()
xsa2D.multi <- group2D(xsaFA.multi, 60, 30)
#> Added 6778 spectra
#> Condensing...
#> Grouped 345 2D pseudospectra
proc.time() - timer
#> user system elapsed
#> 340.02 0.10 340.35
group2D can run in parallel if future.apply is installed. The below block
should demonstrate an improvement in processing time when data are grouped in
parallel.
plan(multiprocess)
timer <- proc.time()
xsa2D.multi.p <- group2D(xsaFA.multi, 60, 30, parallelized = T)
#> Added 6778 spectra
#> Condensing...
#> Grouped 345 2D pseudospectra
proc.time() - timer
#> user system elapsed
#> 76.53 11.75 214.00
identical(xsa2D.multi, xsa2D.multi.p)
#> [1] TRUE
Visualizing 2D chromatography data is easier as a heatmap plot, in which the x-axis shows 1D RT, the y-axis shows 2D RT, and the z-axis (color) shows ion intensity.
plot2D(tea.data, file = 1, 60, 30)
Because the total ion chromatogram (TIC) is a combination of intensities of all
ion channels, this information underrepresents the chemical diversity in this
metabolomics sample. Below is an example of an extracted ion chromatogram (EIC)
for ion 195, representing caffeine.
plot2D(tea.data, file = 1, mod.time = 60, dead.time = 30, ion = 195.0882)
It may be useful to iterate through the m/z acquisition range and generate EICs for each ion. The example code below would do so for m/z range 100 to 108. The upper range for the loop can be adjusted to the max m/z recorded.
data.mz.range <- c(round(min(tea.data@featureData@data$lowMZ)):108)
eics.2d <- lapply(data.mz.range, function(x){
plot2D(tea.data, file = 1, mod.time = 60, dead.time = 30, ion = x,
mz.tol = 'abs', save.output = F)
})